NSF PAR Search | NSF Public Access Repository

Exact Policy Recovery in Offline RL with Both Heavy-Tailed Rewards and Data Corruption

https://doi.org/10.1609/aaai.v38i10.29022

Chen, Yiding; Zhang, Xuezhou; Xie, Qiaomin; Zhu, Xiaojin (March 2024, Proceedings of the AAAI Conference on Artificial Intelligence)

We study offline reinforcement learning (RL) with heavy-tailed reward distributions and data corruption: (i) moving beyond sub-Gaussian reward distributions, we allow the rewards to have infinite variance; (ii) we allow corruptions in which an attacker can arbitrarily modify a small fraction of the rewards and transitions in the dataset. We first derive a sufficient optimality condition for generalized Pessimistic Value Iteration (PEVI), which accommodates various estimators with proper confidence bounds and applies to multiple learning settings. To handle the combination of data corruption and heavy-tailed rewards, we prove that trimmed-mean estimation achieves the minimax-optimal error rate for robust mean estimation under heavy-tailed distributions. We then plug the trimmed-mean estimator and its confidence bound into PEVI to solve the robust offline RL problem. Standard analysis shows that data corruption induces a bias term in the suboptimality gap, which gives the false impression that any data corruption prevents learning an optimal policy. Using the optimality condition for generalized PEVI, we show that as long as the bias term is smaller than the "action gap", the policy returned by PEVI achieves the optimal value given sufficient data.
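The central statistical ingredient in the abstract is trimmed-mean estimation. As a minimal sketch, assuming the simplest symmetric variant (sort the samples, drop a fixed fraction from each tail, average the rest), the snippet below contrasts it with the plain empirical mean on rewards that have infinite variance and a small fraction of adversarially overwritten entries. The exact estimator analyzed in the paper, its trimming level, and its confidence bound may differ; `trimmed_mean`, `corrupted_heavy_tailed_rewards`, and all parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def trimmed_mean(samples, trim_frac):
    """Average of `samples` after discarding the lowest and highest `trim_frac` fraction.

    Simple symmetric trimmed mean (an assumption; the paper's estimator may differ).
    `trim_frac` must lie in [0, 0.5) so that at least one sample survives.
    """
    if not 0.0 <= trim_frac < 0.5:
        raise ValueError("trim_frac must be in [0, 0.5)")
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    n = len(xs)
    k = math.floor(trim_frac * n)  # number of samples dropped from each tail
    kept = xs[k:n - k]
    return sum(kept) / len(kept)


def corrupted_heavy_tailed_rewards(n, corrupt_frac, true_mean=1.0, seed=0):
    """Hypothetical demo data generator (not from the paper).

    Clean rewards are true_mean + S * P, with S a random sign and P ~ Pareto(1.5):
    symmetric about true_mean, finite mean, infinite variance. An "attacker" then
    overwrites the first corrupt_frac * n entries with a huge value.
    """
    rng = random.Random(seed)
    rewards = [true_mean + rng.choice((-1.0, 1.0)) * rng.paretovariate(1.5)
               for _ in range(n)]
    for i in range(int(corrupt_frac * n)):
        rewards[i] = 1e6
    return rewards


if __name__ == "__main__":
    data = corrupted_heavy_tailed_rewards(n=2000, corrupt_frac=0.02)
    # The empirical mean is dominated by the 40 corrupted entries.
    print("empirical mean:", sum(data) / len(data))
    # The trimmed mean discards them and should land near true_mean = 1.0.
    print("trimmed mean:  ", trimmed_mean(data, trim_frac=0.05))
```

Here trimming 5% from each tail exceeds the assumed 2% corruption, so every overwritten entry falls in the discarded upper tail; trimming also caps the influence of genuine extreme draws, which keeps the estimate stable despite the infinite variance. Because this trim removes more genuine points from the lower tail than from the upper one, a small bias remains; in analyses of trimmed means, the trimming level is typically tied to the corruption fraction and the target confidence level rather than fixed by hand.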
Provable Benefits of Representational Transfer in Reinforcement Learning

Agarwal, Alekh; Song, Yuda; Sun, Wen; Wang, Kaiwen; Wang, Mengdi; Zhang, Xuezhou (July 2023, The Conference on Learning Theory)

Corruption-robust offline reinforcement learning

Zhang, Xuezhou; Chen, Yiding; Zhu, Xiaojin; Sun, Wen (January 2022, The 25th International Conference on Artificial Intelligence and Statistics)

Provable defense against backdoor policies in reinforcement learning

Bharti, Shubham; Zhang, Xuezhou; Singla, Adish; Zhu, Xiaojin (January 2022, Advances in Neural Information Processing Systems)

Robust policy gradient against strong data corruption

Zhang, Xuezhou; Chen, Yiding; Zhu, Xiaojin; Sun, Wen (January 2021, International Conference on Machine Learning (ICML))
Online Data Poisoning Attacks

Zhang, Xuezhou; Zhu, Xiaojin; Lessard, Laurent (January 2020, Proceedings of the 2nd Conference on Learning for Dynamics and Control)

Adaptive Reward-Poisoning Attacks against Reinforcement Learning

Zhang, Xuezhou; Ma, Yuzhe; Singla, Adish; Zhu, Xiaojin (January 2020, International Conference on Machine Learning)

An Optimal Control Approach to Sequential Machine Teaching

Lessard, Laurent; Zhang, Xuezhou; Zhu, Xiaojin (January 2019, International Conference on Artificial Intelligence and Statistics)
